# If the devtools package is not already installed, please run the disabled line below.
# install.packages("devtools")
devtools::install_github("tsimonato/gtaptools")
gtaptools
Introduction
Welcome to the tutorial for the gtaptools package in R, which is currently under development. This material provides practical examples of how researchers can use gtaptools when working with Computable General Equilibrium (CGE) models.
The package aims to improve file management, increase productivity, and allow users to build scriptable pipelines that promote reproducibility. It also provides graphical visualization tools that increase the analytical potential of the database and results. From a broader perspective, this package is part of a long-term agenda to develop tools that bring the functionality and flexibility of the R language to CGE modeling, such as the HARr and TabloToR projects¹.
The gtaptools package is designed to be user-friendly and comes with detailed documentation and built-in example code that can be explored in RStudio. We hope this manual serves as practical support for users, and we welcome community feedback and contributions.
Installation
To use the gtaptools package, you need R installed on your computer. We also recommend RStudio, which provides a user-friendly interface for working with R.
You can install the development version of gtaptools from GitHub with the devtools command shown at the top of this page.
Compatibility
The package's tools mainly work with .har, .sl4 and other file formats generated by GEMPACK, which is widely used in CGE modeling². As a result, some of the package's functionality requires a degree of familiarity with the structure of the data used in the GEMPACK suite³. This knowledge is not strictly required to manipulate and visualize CGE model data with the package, but lacking it makes the learning curve steeper.
The package also includes functions for data cleaning, manipulation, and visualization that can be used with other data formats, such as R data.frames and arrays. Although GEMPACK itself is only compatible with Windows, the package's features work on other operating systems supported by R and RStudio, such as Linux and macOS.
Tools
This section of the manual provides a comprehensive overview of the package’s tools organized by their individual functionalities. The functions are categorized into four topics: Tools, Data and file management, Data viz (Static and Reactive), and Report automation. Each topic is further divided into subtopics that describe the specific functions and their intended use cases. Understanding the package’s functions can help users streamline their data analysis workflows and create effective visualizations and reports.
Data/file management
This section presents the tools that manipulate databases. They have a simple structure, and the syntax adopted for the custom functions follows a format familiar to GEMPACK users.
- gtaptools::har_shape
har_shape is an efficient tool for binding databases and modifying headers in array or data.frame format. It allows users to combine several databases quickly and easily while generating new variables from custom calculations. It is especially useful for integrating .har databases with other datasets in the R environment, which suits analysts who need to work with different classes of datasets.
- Reads and combines .har files, arrays and data.frames.
- Creates/changes headers from calculations.
- Writes headers to disk.
In general, the execution of the function follows three steps:
- Create a list of the datasets that must be combined (input_data).
- Create a list with the calculations that will be executed (new_calculated_vars).
- Execute the function, defining whether any headers should be deleted (del_headers), and whether and where the sets (export_sets) and the numerical database (output_har_file) should be saved.
Let’s assume we want the following:
path_to_har <- gtaptools::templates("oranig_example.har")
input_data <- list( # (1)
  path_to_har, # Path to .har database
  list(
    input_data = gtaptools::example_df, # Data.frame
    # Description of the header that will be created from the data.frame:
    header = quote(`1MAR`[c("COM", "SRC", "IND", "MAR")])
  )
)
output_har_r <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = NULL, # (2)
    output_har_file = "gtaptools_shape_example1.har" # (3)
  )

- (1) Combines a database in .har format, stored on disk at path_to_har, with a new header `1MAR`[c("COM", "SRC", "IND", "MAR")] created from the data.frame gtaptools::example_df.
- (2) No calculation is done to generate/change headers.
- (3) Writes the output to "gtaptools_shape_example1.har" and returns it to the R environment as the list object output_har_r.
Header names that start with digits, like 1MAR, must be enclosed in backticks (`1MAR`) to be properly recognized in R.
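The backtick rule comes from base R rather than from gtaptools: a name that starts with a digit is not syntactic, so it has to be quoted with backticks wherever it is used as a symbol. A minimal sketch in base R, where the `headers` list is a made-up stand-in for headers read from a .har file:

```r
# `headers` is a hypothetical stand-in for a list of headers read from a .har file.
headers <- list(`1MAR` = c(10, 20, 30), MAKE = c(1, 2))

# Backticks let R parse the non-syntactic name as a symbol.
first_mar <- headers$`1MAR`

# The same applies inside quote(): the captured expression keeps the name "1MAR".
expr <- quote(`1MAR`[c("COM", "SRC")])
header_name <- as.character(expr[[2]])
header_sets <- eval(expr[[3]])

# Without backticks, R would need an altered name, as make.names() shows.
syntactic_name <- make.names("1MAR")

print(header_name)
print(header_sets)
print(syntactic_name)
```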
The arrays and data.frames used as input (like gtaptools::example_df in (1)) must contain columns with names that correspond to the mentioned sets. In the case of data.frames, the column with the numerical values must also have the same name as the header that will be generated (1MAR in this example).
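To make this layout concrete, the sketch below builds a small long-format data.frame in base R and reshapes it into an array with named dimensions. The set elements and values are invented for illustration, and only two sets are used to keep it short. It does not reproduce what har_shape does internally.

```r
# Hypothetical set elements; one categorical column per set.
toy_df <- expand.grid(
  COM = c("Cereals", "Steel"),
  SRC = c("dom", "imp"),
  stringsAsFactors = FALSE
)

# The numeric column carries the header name, so it needs backticks.
toy_df$`1MAR` <- c(10, 20, 30, 40)

# Reshape to an array whose dimensions are named after the sets.
toy_arr <- tapply(toy_df$`1MAR`, toy_df[c("COM", "SRC")], sum)

print(toy_df)
print(toy_arr)
```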
Therefore, to generate `1MAR`[c("COM", "SRC", "IND", "MAR")], the data.frame used has categorical columns that correspond to the sets and a numerical column whose name corresponds to the header. Take a look at the first 20 rows of this data.frame below.
DT::datatable(gtaptools::example_df[1:20,])
Now let's assume another scenario.
path_to_har <- gtaptools::templates("oranig_example.har")
input_data <- list( # (1)
  path_to_har # Path to .har database
)
new_calculated_vars <- list( # (2)
  quote(MARC["COM"] := `1MAR`), # Sum 1MAR to set COM
  quote(MACM[c("COM")] := apply(`1MAR`, c("COM"), mean, na.rm = T)), # Mean of 1MAR by COM
  quote(MULT[c("COM", "IND")] := solve(MAKE)), # Invert the MAKE matrix
  quote(NSET := c("Comm1", "Comm2")) # Create sets
)
output_har_r <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    del_headers = c("1LND"), # (3)
    export_sets = "gtaptools_shape_example2_sets.har",
    output_har_file = "gtaptools_shape_example2.har"
  )

- (1) Reads a .har from disk as input.
- (2) Generates the MARC and MACM headers, which aggregate 1MAR to COM by sum (default) and by mean. Generates MULT, with the "COM" and "IND" sets, which is the inverse matrix of the MAKE header. Creates a new set header NSET consisting of the "Comm1" and "Comm2" elements.
- (3) Deletes the 1LND header. Saves sets and numeric headers in different files.
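The MULT formula in (2) relies on base R's solve(), which returns the inverse when it is called with a single square matrix. A small sketch with a made-up 2x2 matrix (not the MAKE header from the example database) illustrates the behaviour:

```r
# Hypothetical square matrix with named dimensions.
make_toy <- matrix(
  c(2, 1,
    0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(COM = c("c1", "c2"), IND = c("i1", "i2"))
)

# With one argument, solve() inverts the matrix (it must be square and non-singular).
mult_toy <- solve(make_toy)

# Multiplying a matrix by its inverse should give the identity matrix.
check <- make_toy %*% mult_toy

print(mult_toy)
print(round(check, 10))
```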
MACM is the aggregation by mean of 1MAR. To aggregate by a function other than sum, the expression apply(`1MAR`, c("COM"), mean) was used, which can be read as an aggregation of 1MAR to "COM" applying the mean function. The na.rm = T argument is not necessary here; it is included only to show how other parameters of the aggregation function (mean in this case) can be specified.
Note the syntax adopted in the formulas. It is essential to use := (and not =), [ ], c("Set1", "Set2", ...) when there is more than one set, and to wrap the formula in quote( ).
The aggregation performed in MARC["COM"] := `1MAR` is defined by the set indicated for the output header. Since the 1MAR header is composed of 4 sets ("COM", "SRC", "IND", "MAR") and the output is composed of only 1 ("COM"), the tool automatically aggregates the output to COM by sum.
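The two kinds of aggregation can be reproduced on a toy array in base R. This is only a sketch of the underlying arithmetic, with invented set elements and values. It does not show how har_shape implements the default sum.

```r
# Hypothetical 3-dimensional array with named dimensions (sets).
arr <- array(
  1:8,
  dim = c(2, 2, 2),
  dimnames = list(
    COM = c("c1", "c2"),
    SRC = c("dom", "imp"),
    IND = c("i1", "i2")
  )
)

# apply() accepts a dimension name as MARGIN when the dimnames are named.
by_sum <- apply(arr, "COM", sum)
by_mean <- apply(arr, "COM", mean, na.rm = TRUE)

print(by_sum)
print(by_mean)
```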
Calculations in this tool are done between arrays, and R offers a vast range of possibilities for manipulating arrays. For example, conditional statements and intermediate aggregations over sets, as commonly adopted in GEMPACK scripts, can be applied.
path_to_har <- gtaptools::templates("oranig_example.har")
input_data <- list( # (1)
  path_to_har # Path to .har database
)
new_calculated_vars <- list(
  # (2)
  quote(SHMA[c("COM", "IND")] := MAKE / apply(MAKE, c("COM"), sum)),
  # (3)
  quote(SH3B[c("COM", "SRC")] := `3BAS` / ifelse(apply(`3BAS`, c("COM"), sum) == 0, 1, apply(`3BAS`, c("COM"), sum))),
  # (4)
  quote(IF3B[c("COM", "SRC")] := `3BAS` + ifelse(`SH3B` > 0.95, `5TAX`, 0))
)
output_har <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    del_headers = NULL,
    export_sets = F, # (5)
    output_har_file = "gtaptools_shape_example3.har"
  )

- (1) Reads a .har from disk as input.
- (2) Creates SHMA, which is the share of MAKE by "COM".
- (3) Creates SH3B, which is the share of 3BAS in "COM", with a condition to avoid division by zero.
- (4) Creates IF3B, which is 3BAS plus 5TAX where SH3B > 0.95.
- (5) Saves the output .har file and does not include the sets in it (export_sets = F).
To handle division by zero we can use ifelse statements. In (3), when the denominator is zero (apply(`3BAS`, c("COM"), sum) == 0), the value 1 is used in the denominator, and apply(`3BAS`, c("COM"), sum) otherwise.
It is also possible to use cross-references, as in (4), where 5TAX is added to 3BAS only where SH3B > 0.95.
Note that the SH3B header was created and then used as an input in the following formula. This is possible because the tool processes the calculations sequentially. Therefore, it is necessary to follow this sequence: create the header > use it as input.
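The zero-denominator guard and the conditional addition can be tried out on toy matrices in base R. The names `bas` and `tax` and all values below are invented stand-ins for 3BAS and 5TAX, and the threshold is lowered to 0.5 so the toy data triggers it.

```r
# Hypothetical stand-ins for the 3BAS and 5TAX headers.
sets <- list(COM = c("c1", "c2"), SRC = c("dom", "imp"))
bas <- array(c(4, 0, 6, 0), dim = c(2, 2), dimnames = sets)
tax <- array(1, dim = c(2, 2), dimnames = sets)

# Totals by COM; c2 has a zero total.
tot <- apply(bas, "COM", sum)

# Replace zero denominators by 1 so the division is always defined.
den <- ifelse(tot == 0, 1, tot)

# The vector is recycled along the first dimension (COM), so each row
# is divided by its own total.
share <- bas / den

# Conditional addition: add tax only where the share exceeds the threshold.
adj <- bas + ifelse(share > 0.5, tax, 0)

print(share)
print(adj)
```

When the set being aggregated is not the first dimension of the array, recycling no longer lines up, and sweep(bas, MARGIN, den, "/") is a more explicit alternative.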
Check the data headers that are written to "gtaptools_shape_example3.har":
DT::datatable(as.data.frame.table(output_har$SHMA))
DT::datatable(as.data.frame.table(output_har$SH3B))
DT::datatable(as.data.frame.table(output_har$IF3B))
Arrays converted to data.frame through the as.data.frame.table() function have "Freq" as the name of the column of numerical values. Keep this in mind when using this type of conversion.
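The "Freq" default can be seen, and overridden, with the responseName argument of as.data.frame.table(). A short base R sketch on an invented array:

```r
# Hypothetical array with named dimensions.
arr <- array(
  1:4,
  dim = c(2, 2),
  dimnames = list(COM = c("c1", "c2"), SRC = c("dom", "imp"))
)

# Default conversion: the value column is called "Freq".
df_default <- as.data.frame.table(arr)

# responseName renames the value column, e.g. to match a header name.
df_named <- as.data.frame.table(arr, responseName = "SHMA")

print(names(df_default))
print(names(df_named))
```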
Another practical example is the calculation of shocks that relate the .har database to external data, for use as a shock input file in RunDynam. Let's assume we have information on an increase in household consumption, in monetary terms, of 70 billion for H01 households and 50 billion for H02 households. Let's build a data.frame to represent this economic policy:
path_to_har <- gtaptools::templates("oranig_example.har")
input_har <-
  gtaptools::har_shape(
    input_data = path_to_har # (1)
  )
set_hou <- input_har$HOU
set_years <- paste0("Y", 2015:2018)
POL <- expand.grid(HOU = set_hou, # (2)
                   YEAR = set_years)
POL$POL <- 0
# (3)
POL[POL$YEAR == "Y2015", "POL"] <- c(70e9, 50e9, 0, 0, 0, 0, 0, 0, 0, 0)
DT::datatable(POL)

- (1) Reads the .har database.
- (2) Creates a data.frame with the YEAR and HOU sets.
- (3) Fills the data.frame with values in monetary units. The last line displays the resulting POL.
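The assignment in (3) depends on the row order produced by expand.grid(), where the first variable varies fastest. A reduced sketch with three invented household labels and two years shows how the values line up:

```r
# Hypothetical, shortened sets (the example database has more households).
set_hou <- c("H01", "H02", "H03")
set_years <- paste0("Y", 2015:2016)

# HOU varies fastest, so the first three rows are H01..H03 for Y2015.
pol <- expand.grid(HOU = set_hou, YEAR = set_years)
pol$POL <- 0

# One value per household, in the order of set_hou.
pol[pol$YEAR == "Y2015", "POL"] <- c(70e9, 50e9, 0)

print(pol)
```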
The POL data.frame stores the policy values in monetary units. We can then calculate the share between the values in POL and the household consumption represented by 3PUR[c("COM", "HOU")] in the base "oranig_example.har".
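As a back-of-the-envelope version of this share, the sketch below computes a percentage shock in base R. The policy values come from the text, while the consumption matrix `pur_toy` is invented. It assumes, as in the example database, that consumption is stored in millions of currency units while the policy is given in units.

```r
# Policy values from the text, in currency units.
pol_2015 <- c(H01 = 70e9, H02 = 50e9)

# Hypothetical household consumption by commodity, in millions.
pur_toy <- matrix(
  c(400000, 300000, 150000, 100000),
  nrow = 2,
  dimnames = list(COM = c("c1", "c2"), HOU = c("H01", "H02"))
)

# Total consumption per household (pre-aggregation to HOU).
cons_hou <- apply(pur_toy, "HOU", sum)

# Percentage shock: policy value relative to consumption converted to units.
shock <- 100 * pol_2015 / (cons_hou * 1e6)

print(shock)
```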
input_data <- list( # (1)
  list(
    input_data = POL, # Data.frame
    header = quote(POL[c("YEAR", "HOU")])
  ),
  list(
    input_data = input_har$`3PUR`, # Array
    header = quote(`3PUR`[c("COM", "HOU")])
  )
)
new_calculated_vars <- list(
  # (2)
  quote(SHOC[c("YEAR", "HOU")] := 100*(POL/(apply(`3PUR`, "HOU", sum)*1e6)))
)
output_har <-
  gtaptools::har_shape( # (3)
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    output_har_file = "gtaptools_shape_example_shock.har"
  )

- (1) Creates a list composed of the data.frame POL and the 3PUR array from the original .har database.
- (2) Defines the calculation of the shock. Note that 3PUR is "pre-aggregated" to "HOU" in apply(`3PUR`, "HOU", sum), and is multiplied by 1e6 because of the monetary unit adopted in the Brazilian Input-Output Matrix used to calibrate this CGE database.
- (3) Runs the calculations and writes the file "gtaptools_shape_example_shock.har".
Check the data headers in this output file:
DT::datatable(as.data.frame.table(output_har$SHOC))
DT::datatable(as.data.frame.table(output_har$`3PUR`))
DT::datatable(as.data.frame.table(output_har$POL))
- gtaptools::agg_har
Data viz
Spatial data
- gtaptools::plot_map
The plot_map tool creates static and reactive maps with the ggplot2 and leaflet packages. It requires an input_data data.frame with at least one numeric column and one categorical column with region IDs such as iso_a2, iso_a3, or iso_n3. The value_var parameter specifies the numeric variable to be plotted on the map. The optional region_var parameter names the variable containing the region labels used to aggregate the spatial (sf) object. The colors parameter specifies the color palette or a custom vector of color breaks. The borders_color and borders_size parameters customize the color and size of the border lines. The legend_title and legend_pos parameters specify the legend's title and position, and legend_labels allows replacing the numeric color scale labels with custom numeric or character labels. The reactive parameter should be F for static maps. The fillOpacity parameter sets the transparency of the color fill layer. The tool supports various color palettes, such as Viridis, Color Brewer Sequential, and Color Brewer Diverging, as well as custom color palettes. Check the manual for more details.
- Plot static and reactive global maps from data in data.frames.
In this way, we can summarize the implementation of this tool in three steps:
A. Create a data.frame (input_data) that relates each ISO country code to the column with the numerical values (value_var) that will be plotted.
B. Customize the legend elements if needed.
C. Customize color elements if needed.
The data.frame given in input_data and the spatial sf object are matched using the iso_a2, iso_a3, and iso_n3 vectors. Therefore, input_data must contain at least one column whose name and content are consistent with these ISO country codes for the match to work. Although the user will likely provide the region aggregation and labeling variable (region_var) via input_data, some vectors are also available built-in.
DT::datatable(gtaptools::template_map[c("iso_a2", "iso_a3", "iso_n3")])
sf <- rnaturalearth::ne_countries(scale = "small", returnclass = "sf")
sf <- sf::st_drop_geometry(sf)
DT::datatable(sf[1:20,])
Colors are not plotted for regions where value_var is not numerical or is NA.
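The effect of matching on ISO codes can be illustrated with a plain merge() in base R. This is only an analogy for the matching described above, not the code plot_map runs, and the values are invented: a region with no matching row ends up with NA and would therefore receive no color.

```r
# Regions known to the map (stand-in for the spatial object's attribute table).
regions <- data.frame(iso_a3 = c("BRA", "USA", "FRA"), stringsAsFactors = FALSE)

# Hypothetical user data: no row for FRA, values are made up.
values <- data.frame(
  iso_a3 = c("BRA", "USA"),
  gdp_pc = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Left join on the ISO code keeps every region; unmatched ones get NA.
merged <- merge(regions, values, by = "iso_a3", all.x = TRUE)

print(merged)
```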
Check out some usage examples:
gtaptools::plot_map(
  input_data = gtaptools::template_map, # (1)
  value_var = "gdp_pc", # (2)
  region_var = "name", # (3)
  colors = "viridis", # (4)
  legend_title = "GDP per capita 2021, PPP</br>(constant 2017 international $)" # (5)
)

- (1) A data.frame with at least 1 column to match "iso_a2", "iso_a3" or "iso_n3".
- (2) Numerical variable to be plotted.
- (3) Region variable for spatial aggregation and labeling.
- (4) Color palette.
- (5) Note that in the case of an interactive map, we can use HTML commands in the legend title, like </br> to break the line.